# High-Fidelity Audio

Llasa 3B
Llasa is a text-to-speech (TTS) system built on LLaMA. It extends the language model with speech tokens and supports Chinese and English speech generation.
Speech Synthesis, Supports Multiple Languages
unsloth
55
1
Handler
MIT
Bark is a Transformer-based text-to-audio model created by Suno, capable of generating highly realistic multilingual speech, music, background noise, and sound effects.
Speech Synthesis, Supports Multiple Languages
walterheart
20
0
Stable Audio Open 1.0 Music
Other
Stable Audio Tools is a text-to-audio model capable of generating high-quality audio content based on text descriptions.
Audio Generation, English
Nekochu
62
3
F5 TTS German
A German speech synthesis model built on F5-TTS, which uses flow matching and aims for fluent, faithful speech output.
Speech Synthesis, Supports Multiple Languages
marduk-ra
577
26
Vits Eng
MIT
An English text-to-speech model based on the VITS architecture, trained by Kakao Enterprise, that supports high-quality speech synthesis.
Speech Synthesis, Transformers, English
BricksDisplay
28
4
Musicgen Melody Large
MusicGen is a text-to-music generation model developed by Meta AI, capable of producing high-quality music samples based on text descriptions or audio prompts.
Audio Generation, Transformers
facebook
1,414
29
Harry Styles E150 S6600
This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of transforming input audio into Harry Styles' distinctive vocal style.
Speech Synthesis, Transformers
sail-rvc
1,659
0
Taylor Swift RVC V1
This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of transforming input audio into Taylor Swift-style speech.
Speech Synthesis, Transformers
sail-rvc
4,540
0
Michaeljackson
This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of transforming input audio into Michael Jackson-style speech.
Speech Synthesis, Transformers
sail-rvc
6,250
0
Dua Lipa E1590 S28620
This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of transforming input audio into Dua Lipa's vocal style.
Speech Synthesis, Transformers
sail-rvc
1,944
0
BLACKPINK JISOO RVC V1
This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, specifically designed to transform input audio into the vocal style of BLACKPINK member JISOO.
Speech Synthesis, Transformers
sail-rvc
1,000
0
Musicgen Medium
MusicGen is a text-to-music model that generates high-quality music samples based on text descriptions or audio prompts, utilizing a 1.5-billion-parameter autoregressive Transformer architecture.
Audio Generation, Transformers
facebook
1.5M
118
Bark
MIT
Bark is a Transformer-based text-to-audio model created by Suno, capable of generating highly realistic multilingual speech, music, background noise, and simple sound effects.
Speech Synthesis, Transformers, Supports Multiple Languages
suno
35.72k
1,326
Speecht5 Tts
MIT
A SpeechT5 text-to-speech (TTS) model fine-tuned on the LibriTTS dataset for high-quality speech synthesis.
Speech Synthesis, Transformers
microsoft
113.83k
760
Kan Bayashi Ljspeech Joint Finetune Conformer Fastspeech2 Hifigan
This is an ESPnet2 text-to-speech (TTS) model trained on the LJSpeech dataset. It combines a Conformer-based FastSpeech2 acoustic model with a HiFi-GAN vocoder, fine-tuned jointly.
Speech Synthesis, English
espnet
20
16
Convtasnet Libri2Mix Sepclean 16k
This is a ConvTasNet audio separation model trained with the Asteroid framework on the sep_clean task of the Libri2Mix dataset.
Sound Separation
JorisCos
13.38k
2
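The two counters that close each card mix plain integers (`55`), thousands separators (`1,414`), and abbreviated suffixes (`35.72k`, `1.5M`), so they cannot be compared or sorted as written. The sketch below normalizes them to integers. It assumes the suffixes are decimal (`k` = 1,000, `M` = 1,000,000); the helper name `parse_count` is hypothetical and not part of any site or library API.

```python
from decimal import Decimal

# Assumed decimal multipliers for the suffixes seen in the listing.
_SUFFIXES = {"k": 1_000, "m": 1_000_000}


def parse_count(text: str) -> int:
    """Convert a display count such as '1,414', '35.72k' or '1.5M' to an int."""
    cleaned = text.strip().replace(",", "")
    if not cleaned:
        raise ValueError("empty count")
    suffix = cleaned[-1].lower()
    if suffix in _SUFFIXES:
        # Decimal avoids binary floating-point error in products like 35.72 * 1000.
        return int(Decimal(cleaned[:-1]) * _SUFFIXES[suffix])
    return int(cleaned)


# Counter strings copied from the cards above.
counts = ["55", "1,414", "13.38k", "35.72k", "113.83k", "1.5M"]
parsed = [parse_count(c) for c in counts]
print(parsed)
```

Because the abbreviated values are rounded on the page, the parsed integers are approximations of the underlying counts, not exact figures.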
© 2025 AIbase